Abstract. Methodological contributions: This paper introduces a family 
of kernels for analyzing (anatomical) trees endowed with vector valued 
measurements made along the tree. While state-of-the-art graph and 
tree kernels use combinatorial tree/graph structure with discrete node 
and edge labels, the kernels presented in this paper can include geo- 
metric information such as branch shape, branch radius or other vector 
valued properties. In addition to being flexible in their ability to model 
different types of attributes, the presented kernels are computationally 
efficient and some of them can easily be computed for large datasets 
(N ~ 10,000) of trees with 30 to 600 branches. Combining the kernels 
with standard machine learning tools enables us to analyze the relation 
between disease and anatomical tree structure and geometry. Experimen- 
tal results: The kernels are used to compare airway trees segmented from 
low-dose CT, endowed with branch shape descriptors and airway wall 
area percentage measurements made along the tree. Using kernelized hy- 
pothesis testing we show that the geometric airway trees are significantly 
differently distributed in patients with Chronic Obstructive Pulmonary 
Disease (COPD) than in healthy individuals. The geometric tree kernels 
also give a significant increase in the classification accuracy of COPD 
from geometric tree structure endowed with airway wall thickness mea- 
surements in comparison with state-of-the-art methods, giving further 
insight into the relationship between airway wall thickness and COPD. 
Software: Software for computing kernels and statistical tests is available 
at http://image.diku.dk/aasa/software.php



1 Introduction 



Anatomical trees like blood vessels, dendrites or airways carry information about
the organs they are part of, and if we can meaningfully compare anatomical
trees and measurements made along them, then we can learn more about aspects
of disease related to the anatomical trees [111123]. For example, airway
wall thickness is known to be a biomarker for Chronic Obstructive Pulmonary 
Disease (COPD), and in order to compare airway wall thickness measurements 
in different patients, a typical approach is to compare average airway wall area 
percentage measurements for given airway tree generations of particular subtrees
[10,11]. These approaches assume that measurements made in different
locations of the lung are comparable on a common scale, which is not always the
case [10,11]. If we can compare tree structures attributed with measurements
made along them in a way which respects the structure and geometry of the tree,
then we can more robustly compare measurements whose values are location
sensitive. In this paper we present a family of kernels for comparing anatomi- 
cal trees endowed with vector attributes, and use these to get a more detailed 
understanding of how COPD correlates with airway structure and geometry. 

Related work. Several approaches to statistics on attributed (geometric) trees 
have recently appeared, and some of them were applied to airway trees [7,8, 
[TBIE]. These methods only consider branch length or shape and do not allow 
for using additional measurements along the airway tree, such as branch radius 
or airway wall area percentage. Moreover, these methods are computationally 
expensive [6], or need a set of leaf labels [8,17], making them less applicable for
general trees. Sørensen [20] treats the airway tree as a set of attributed branches
which are matched and then compared using a dissimilarity embedding combined
with a k-NN classifier. The matching introduces an additional computational
cost and makes the approach vulnerable to incorrect matches. 

Kernels are a family of similarity measures equivalent to inner products be- 
tween data points implicitly embedded in a Hilbert space. Kernels are typically 
designed to be computationally fast while discriminative for a given problem, 
and often give nonlinear similarity measures in the original data space. Using the 
Hilbert space, many Euclidean data analysis methods are extended to kernels, 
such as classification [3] or hypothesis testing [5]. Kernels are popular because 
they give computational speed, modeling flexibility and access to linear data 
analysis tools for data with nonlinear behavior. 

There are kernels available for structured data such as strings [5,13], trees [22],
graphs [3,19,21] and point clouds pQ. The current state-of-the-art graph kernel
in terms of scalability is the Weisfeiler-Lehman (WL) []15] kernel, which com- 
pares graphs by counting isomorphic labeled subtrees of a particular type and 
"radius" h. The WL scales linearly in h and the number of edges, but the scala- 
bility depends on algorithmic constructions for finite node label sets. Thus, the 
WL kernel, like most fast kernels developed in natural language processing and 
bioinformatics [5,13,22], does not generalize to vector-valued branch attributes.

Walk- and path based kernels [UGH [21], which reduce to comparing sub- 
walks or -paths of the graphs, are state-of-the-art among kernels which include 
continuous-valued graph attributes. Random walk-type kernels [T1I21] suffer from
several problems including tottering [15] and high computational cost. The shortest
path kernel [3] by default only considers path length, and some of the kernels
developed in this paper can be viewed as extensions of the shortest path kernel. 

Contributions. We develop a family of kernels which are computationally fast 
enough to run on large datasets, and can incorporate any vectorial attributes on 
nodes¹, e.g., shape or airway wall measurements. Using the kernels in classification
and hypothesis testing experiments, we show that classification of COPD
can be substantially improved by taking geometry into account. This illustrates, 
in particular, that airway wall area percentage measurements made at different 
locations in the airway tree are not comparable on a common scale. 

We compare the developed kernels to state-of-the-art methods. We see, in 
particular, that COPD can also be detected from combinatorial airway tree 
structure using state-of-the-art kernels on tree structure alone, but we show 
that these contain no more information than a branch count kernel, as opposed 
to the geometric tree kernels. 



2 Geometric trees and geometric tree kernels 

Anatomical trees like airways are geometric trees: they consist of both combina- 
torial tree structure and branch geometry (e.g., branch length or shape), where 
continuous changes in the branch geometry can lead to continuous transitions 
in the combinatorial tree structure. In addition to its geometric embedding, a 
geometric tree can be adorned with additional features measured along the tree, 
e.g., airway branch radius, airway wall thickness, airway wall thickness/branch 
radius, airway wall area percentage in an airway cross section, etc. 

Definition 1 A geometric tree is a pair $(T, x)$ where $T = (V, E, r)$ is a combinatorial
tree with nodes $V$, root $r$ and edges $E \subset V \times V$, and $x: V \to \mathbb{R}^n$ is
an assignment of (geometric) attributes from a vector space $\mathbb{R}^n$ to the nodes of
$T$, e.g. 3D position or landmark points. An attributed geometric tree is a triple
$(T, x, a)$ where $(T, x)$ is a geometric tree and $a: x(T) \to \mathbb{R}^d$ is a map assigning a
vector valued attribute $a(p) \in \mathbb{R}^d$ to each point $p \in x(T)$.

A common strategy for defining kernels on structured data such as trees,
graphs or strings is based on combining kernels on sub-structures such as strings,
walks, paths, subtrees or subgraphs [200[13j[T9j[2TJ[22]. These are all instances
of the so-called R-convolution kernels by Haussler [12]. We shall use paths in
trees as building blocks for defining kernels on trees.

Let $(T, x)$ be a geometric tree. Given vertices $v_i, v_j \in V$ there is a unique
path $\pi_{ij}$ from $v_i$ to $v_j$ in the tree, defined by the sequence of visited nodes:

$$\pi_{ij} = [v_i, p^{(1)}(v_i), p^{(2)}(v_i), \ldots, w, \ldots, p^{(2)}(v_j), p^{(1)}(v_j), v_j],$$

where $p^{(0)}(v) = v$, $p^{(1)}(v) = p(v)$ is the parent node of $v$, more generally $p^{(k)}(v) =
p(p^{(k-1)}(v))$, and $w$ is the highest level common ancestor of $v_i$ and $v_j$ in $T$. We
call $\pi_{ij}$ the node-path from $v_i$ to $v_j$ in $T$ and for each $j$ let the node-rootpath $\pi_{jr}$
be the node-path from $v_j$ to the root.

1 Our formulation allows both node and edge attributes, as edge attributes are equivalent
to node attributes on rooted trees: assign each edge attribute to its child node.
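As a rough illustration of the node-path definition, the following Python sketch builds the path between two nodes from parent pointers. The dictionary encoding of the tree, the toy tree and the function names `ancestors` and `node_path` are assumptions made for this example; they are not taken from the published software.

```python
def ancestors(parent, v):
    """Return [v, p(v), p(p(v)), ..., root]; the root has parent None."""
    chain = [v]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain


def node_path(parent, vi, vj):
    """Node-path from vi to vj through their deepest common ancestor w."""
    up = ancestors(parent, vi)
    down = ancestors(parent, vj)
    on_up = set(up)
    # The first ancestor of vj that also lies on the rootpath of vi is w.
    k = next(idx for idx, node in enumerate(down) if node in on_up)
    w = down[k]
    # Walk up from vi to w, then down from w to vj.
    return up[:up.index(w) + 1] + down[:k][::-1]


# Hypothetical toy tree: r has children a and b; a has children c and d.
parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a"}

print(node_path(parent, "c", "b"))  # ['c', 'a', 'r', 'b']
print(node_path(parent, "c", "r"))  # node-rootpath of c: ['c', 'a', 'r']
```

The node-rootpath is simply the special case where the second node is the root.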

If the geometric node attributes $x(v): I \to \mathbb{R}^n$ denote embeddings of the edge
$(v, p(v))$ into the ambient space $\mathbb{R}^n$, a continuous path $x_{ij}: [0, 1] \to \mathbb{R}^n$ can be
defined, connecting the embedded nodes $x(v_i), x(v_j) \in \mathbb{R}^n$ along the embedded
tree $x(V) \subset \mathbb{R}^n$. We call $x_{ij}$ the embedded path from $v_i$ to $v_j$ in $T$.

Throughout the rest of this section, we shall define different kernels for pairs
of trees $T_1$ and $T_2$, where $T_i = (V_i, E_i, r_i, x_i, a_i)$ are attributed geometric trees
(including non-attributed geometric trees as a special case with $a_i \equiv 1$), $i = 1, 2$.
All kernels defined in this section are positive semidefinite, as they are sums
of linear and Gaussian kernels composed with known feature maps. This is a
necessary condition for a kernel to be equivalent to an inner product in a Hilbert
space P] , needed for the analysis methods used in Sec. 3.

2.1 Path-based tree kernels 

All-pairs path kernels. The all-pairs path kernel is a basic path-based tree 
kernel. Given two geometric trees, it is defined as 

$$K_a(T_1, T_2) = \sum_{\substack{(v_i, v_j) \in V_1 \times V_1, \\ (v_k, v_l) \in V_2 \times V_2}} k_p(p_{ij}, p_{kl}), \qquad (2)$$

where $k_p$ is a kernel defined on paths, and $p_{ij}$, $p_{kl}$ are paths connecting $v_i$ to $v_j$
and $v_k$ to $v_l$ in $T_1$ and $T_2$, respectively, for instance $\pi_{ij}$ and $\pi_{kl}$, or $x_{ij}$ and $x_{kl}$,
as defined above. Note that if the path kernel $k_p$ is a path length kernel, then the
all-pairs path kernel is a special case of the shortest path kernel on graphs [5].

The kernel $k_p$ should take large values on paths that are geometrically similar,
and small values on paths which are not, giving a measure of the alignment of
the two tree-paths $p_{ij}$ and $p_{kl}$, making $K_a$ an overall assessment of the similarity
between the two geometric trees $T_1$ and $T_2$. The all-pairs path kernel is attractive in
the sense that it takes every possible choice of paths in the trees into account. It
is, however, expensive: the computational cost is $O(|V|^4) \cdot O(k_p)$, where $|V| =
\max\{|V_1|, |V_2|\}$ and $O(k_p)$ is the cost of the path kernel $k_p$.

Rootpath kernels. The computational complexity can be reduced by only
considering rootpaths, giving a rootpath kernel $K_r$ defined as:

$$K_r(T_1, T_2) = \sum_{v_i \in V_1, \, v_j \in V_2} k_p(p_{ir}, p_{jr}), \qquad (3)$$

where $k_p$ is a path kernel as before, and $p_{ir}$ is the path from $v_i$ to the root $r$.
This reduces the computational complexity to $O(|V|^2) \cdot O(k_p)$.
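To make the difference between Eqs. (2) and (3) concrete, here is a minimal Python sketch that evaluates both sums by brute force on node-paths. The parent-pointer tree encoding, the two toy trees and the path kernel `length_delta` (1 if two paths have the same number of nodes, 0 otherwise) are assumptions chosen for illustration only.

```python
from itertools import product


def rootpath(parent, v):
    """Node-rootpath [v, p(v), ..., root]; the root has parent None."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def node_path(parent, vi, vj):
    """Node-path from vi to vj through their deepest common ancestor."""
    up, down = rootpath(parent, vi), rootpath(parent, vj)
    on_up = set(up)
    k = next(idx for idx, node in enumerate(down) if node in on_up)
    return up[:up.index(down[k]) + 1] + down[:k][::-1]


def all_pairs_kernel(parent1, parent2, k_p):
    """Eq. (2): sum k_p over all ordered node pairs in both trees."""
    return sum(
        k_p(node_path(parent1, vi, vj), node_path(parent2, vk, vl))
        for vi, vj in product(parent1, repeat=2)
        for vk, vl in product(parent2, repeat=2)
    )


def rootpath_kernel(parent1, parent2, k_p):
    """Eq. (3): sum k_p over all pairs of rootpaths."""
    return sum(
        k_p(rootpath(parent1, vi), rootpath(parent2, vj))
        for vi in parent1
        for vj in parent2
    )


def length_delta(p, q):
    """Assumed toy path kernel: compares path lengths only."""
    return 1.0 if len(p) == len(q) else 0.0


tree1 = {"r": None, "a": "r", "b": "r"}  # root with two children
tree2 = {"r": None, "a": "r"}            # root with one child

print(all_pairs_kernel(tree1, tree2, length_delta))  # 14.0, from 9 * 4 = 36 path comparisons
print(rootpath_kernel(tree1, tree2, length_delta))   # 3.0, from 3 * 2 = 6 path comparisons
```

Even on these tiny trees the all-pairs sum needs six times as many path kernel evaluations as the rootpath sum, which is the gap between $|V|^4$ and $|V|^2$ in miniature.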

2.2 Path kernels 

The modeling capabilities and computational complexity of the kernels $K_a$ and
$K_r$ depend on the choices of path kernel $k_p$ and path representation $p$.



Landmark point representation of embedded paths. From a shape modeling
point of view, equidistantly sampled landmark points give a reasonable
representation of a path through the tree. Representing paths by $N$ equidistantly
sampled landmark points $x_{ij} \in (\mathbb{R}^n)^N$, the path kernel $k_p = k_x$ is either
a linear or Gaussian kernel:

$$k_x(x_{ij}, x'_{kl}) = \begin{cases} \langle x_{ij}, x'_{kl} \rangle & \text{(linear, i.e., dot product)} \\ e^{-\lambda \|x_{ij} - x'_{kl}\|^2} & \text{(Gaussian)} \end{cases} \qquad (4)$$

for a scaling parameter $\lambda$ which regulates the width of the Gaussian.
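The landmark representation and the two path kernels of Eq. (4) can be sketched in a few lines of Python. This is an illustration under stated assumptions: the embedded path is given as a polyline, landmarks are placed by linear interpolation in arc length, and the names `resample`, `linear_path_kernel` and `gaussian_path_kernel` are made up for the example.

```python
import math


def resample(polyline, n_landmarks):
    """Place n_landmarks (>= 2) points at equal arc-length spacing along a polyline."""
    seg = [math.dist(p, q) for p, q in zip(polyline, polyline[1:])]
    total = sum(seg)
    landmarks = []
    for k in range(n_landmarks):
        target = total * k / (n_landmarks - 1)
        i, walked = 0, 0.0
        # Advance to the segment that contains the target arc length.
        while i < len(seg) - 1 and walked + seg[i] < target:
            walked += seg[i]
            i += 1
        t = (target - walked) / seg[i] if seg[i] > 0 else 0.0
        p, q = polyline[i], polyline[i + 1]
        landmarks.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return landmarks


def linear_path_kernel(x, y):
    """Dot product of two landmark sequences viewed as vectors in (R^n)^N."""
    return sum(a * b for p, q in zip(x, y) for a, b in zip(p, q))


def gaussian_path_kernel(x, y, lam):
    """exp(-lam * squared distance) between two landmark sequences."""
    sq = sum((a - b) ** 2 for p, q in zip(x, y) for a, b in zip(p, q))
    return math.exp(-lam * sq)


# Hypothetical embedded path with a right-angle bend, total length 4.
path = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0)]
pts = resample(path, 5)
print(pts[3])                               # (2.0, 1.0, 0.0)
print(gaussian_path_kernel(pts, pts, 0.1))  # 1.0 for identical paths
```

Because the landmark spacing depends on the total path length, two paths that share a long segment generally share no landmark points, which is the obstacle to reuse discussed in the next paragraph.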

Node-path kernels. The landmark point kernels are expensive to compute (see
Table 1). In particular, two embedded tree-paths may have large overlapping
segments without having a single overlapping equidistantly sampled landmark 
point, as the distance between landmark points depends on the length of the 
entire path. Thus, most landmark points will only appear in one path, giving 
little opportunity for recursive algorithms or dynamic programming that take 
advantage of repetitive structure. To enable such approaches, we use node-paths. 

Assume that $\pi_1 = [\pi_1(1), \pi_1(2), \ldots, \pi_1(m)]$ and $\pi_2 = [\pi_2(1), \pi_2(2), \ldots, \pi_2(l)]$
are node-paths in $T_1, T_2$, respectively, as defined above, that is, sequences of
consecutive nodes in $V_i$ in $T_i$. We define

$$k_\pi(\pi_1, \pi_2) = \begin{cases} \sum_{i=1}^{L} k_n(\pi_1(i), \pi_2(i)) & \text{if } |\pi_1| = |\pi_2| = L \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

where $k_n$ is a node kernel. In this paper $k_n(v_1, v_2)$ is either a linear kernel
without/with additional attributes $a_i$,

$$\langle x_1(v_1), x_2(v_2) \rangle, \qquad \langle x_1(v_1), x_2(v_2) \rangle \cdot \langle a_1(v_1), a_2(v_2) \rangle,$$

or a Gaussian kernel without/with attributes $a_i$,

$$e^{-\lambda_1 \|x_1(v_1) - x_2(v_2)\|^2}, \qquad e^{-\lambda_1 \|x_1(v_1) - x_2(v_2)\|^2} \cdot e^{-\lambda_2 \|a_1(v_1) - a_2(v_2)\|^2},$$

where the Gaussian weight parameters are heuristically set to the inverse dimension
of the relevant feature space, i.e., $\lambda_1 = 1/n$ and $\lambda_2 = 1/d$.
Now the node-rootpath tree kernel $K_r$ can be rewritten as:

$$K_r(T_1, T_2) = \sum_{l=1}^{h} \sum_{v_1 \in V_1^l} \sum_{v_2 \in V_2^l} \sum_{i=1}^{l} k_n(\pi_{v_1 r}(i), \pi_{v_2 r}(i)), \qquad (6)$$

where $h = \min\{\text{height}(T_i), i = 1, 2\}$ and $\pi_{v r}$ is the node-rootpath of $v$. This
can be reformulated as a weighted sum of node kernels, giving substantially
improved computational complexity:

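Proposition 7 For each $l \le h$, let $V_i^l$ be the set of vertices at level $l$ in $T_i$. Then

$$K_r(T_1, T_2) = \sum_{l=1}^{h} \sum_{v_1 \in V_1^l} \sum_{v_2 \in V_2^l} \langle \delta(v_1), \delta(v_2) \rangle \, k_n(v_1, v_2), \qquad (8)$$

where $\delta(v_i)$ is an $h$-dimensional vector whose $j$th coefficient counts the number of
descendants of $v_i$ at level $j$ in $T_i$, respectively. The complexity of computing $K_r$
is $O(h \max_l |V^l|^2 (n + h))$.

When $k_n$ is a linear kernel $\langle x_1(v_1), x_2(v_2) \rangle$ or $\langle x_1(v_1), x_2(v_2) \rangle \langle a_1(v_1), a_2(v_2) \rangle$,
the kernel $K_r$ can be further decomposed as

$$K_r(T_1, T_2) = \sum_{l=1}^{h} \langle \gamma(T_1, l), \gamma(T_2, l) \rangle, \qquad \gamma(T_i, l) = \begin{cases} \sum_{v \in V_i^l} x_i(v) \otimes \delta(v) & \text{no } a_i \\ \sum_{v \in V_i^l} a_i(v) \otimes x_i(v) \otimes \delta(v) & \text{with } a_i \end{cases} \qquad (9)$$

at total complexity $O(|V|hn)$ / $O(|V|hnd)$ (without/with attributes $a_i$). Here, $\otimes$
denotes the Kronecker product.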
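The equivalence between the path-wise sum in Eq. (6) and the weighted node sum in Eq. (8) can be checked numerically on small trees. The sketch below is an illustration only: the parent-pointer encoding, the toy trees and attributes, and the function names are assumptions, and the descendant vector is stored with entry 0 counting the node itself.

```python
from collections import defaultdict


def rootpath(parent, v):
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def left_aligned_sum(u, v):
    """Left aligned addition: [a, b, c] (+) [d, e] = [a + d, b + e, c]."""
    longer, shorter = (u, v) if len(u) >= len(v) else (v, u)
    return [a + (shorter[i] if i < len(shorter) else 0) for i, a in enumerate(longer)]


def descendant_vectors(parent):
    """delta(v) = [1, (+) over children w of delta(w)], computed bottom-up."""
    kids = defaultdict(list)
    for v, p in parent.items():
        if p is not None:
            kids[p].append(v)
    delta = {}

    def visit(v):
        below = []
        for w in kids[v]:
            below = left_aligned_sum(below, visit(w))
        delta[v] = [1] + below
        return delta[v]

    visit(next(v for v, p in parent.items() if p is None))
    return delta


def linear(u, v):
    return sum(a * b for a, b in zip(u, v))


def kr_pathwise(parent1, x1, parent2, x2, k_n):
    """Eq. (6): compare rootpaths of equal length node by node."""
    total = 0.0
    for v1 in parent1:
        for v2 in parent2:
            p1, p2 = rootpath(parent1, v1), rootpath(parent2, v2)
            if len(p1) == len(p2):
                total += sum(k_n(x1[a], x2[b]) for a, b in zip(p1, p2))
    return total


def kr_weighted(parent1, x1, parent2, x2, k_n):
    """Eq. (8): one node kernel per same-level pair, weighted by <delta, delta>."""
    d1, d2 = descendant_vectors(parent1), descendant_vectors(parent2)
    total = 0.0
    for v1 in parent1:
        for v2 in parent2:
            if len(rootpath(parent1, v1)) == len(rootpath(parent2, v2)):
                total += linear(d1[v1], d2[v2]) * k_n(x1[v1], x2[v2])
    return total


# Hypothetical toy trees with 2-D node attributes.
parent1 = {"r": None, "a": "r", "b": "r", "c": "a"}
x1 = {"r": (0, 0), "a": (1, 0), "b": (-1, 0), "c": (1, 1)}
parent2 = {"r": None, "a": "r", "c": "a", "d": "a"}
x2 = {"r": (0, 0), "a": (1, 1), "c": (2, 1), "d": (0, 2)}

print(descendant_vectors(parent1)["r"])               # [1, 2, 1]
print(kr_pathwise(parent1, x1, parent2, x2, linear))  # 7.0
print(kr_weighted(parent1, x1, parent2, x2, linear))  # 7.0
```

The weighted form evaluates each node kernel once per same-level node pair, whereas the path-wise form re-evaluates it for every pair of descendants.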

Proof. Eq. (8) follows from the fact that the terms $k_n(v_1, v_2)$ in kernel (6) will
be counted once for every pair $(w_1, w_2)$ of descendants of $v_1$ and $v_2$, respectively,
which are at the same level. The descendant vectors $\delta(v_i)$ for all $v_i \in V_i$ can
be precomputed using dynamic programming at computational cost $O(|V|h)$,
since $\delta(v) = [1, \oplus_{p(w)=v} \delta(w)]$, where $\oplus$ is defined as left aligned addition of
vectors². The cost of computing $K_r$ is thus

$$O(|V|h + h \max_l |V^l|^2 (h + n)) = O(h \max_l |V^l|^2 (n + h)),$$

where $O(n)$ is the cost of computing each node kernel $k_n(v_1, v_2)$.

To prove (9) without attributes $a_i$, let $x_1(v_1)$, $x_2(v_2)$, $\delta(v_1)$ and $\delta(v_2)$ be column
vectors and use the Kronecker trick ($\langle a_i, a_j \rangle \langle b_i, b_j \rangle = \langle b_i \otimes a_i, b_j \otimes a_j \rangle$):

$$\begin{aligned} K_r(T_1, T_2) &= \sum_{l=1}^{h} \sum_{v_1 \in V_1^l} \sum_{v_2 \in V_2^l} \langle \delta(v_2), \delta(v_1) \rangle \langle x_1(v_1), x_2(v_2) \rangle \\ &= \sum_{l=1}^{h} \Big\langle \sum_{v_2 \in V_2^l} x_2(v_2) \otimes \delta(v_2), \sum_{v_1 \in V_1^l} x_1(v_1) \otimes \delta(v_1) \Big\rangle \\ &= \sum_{l=1}^{h} \langle \gamma(T_1, l), \gamma(T_2, l) \rangle. \end{aligned}$$

The total complexity is thus $O(\max_i |V_i| hn) + O(|V|h) = O(|V|hn)$. Similar
analysis proves the attributed case. □
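The Kronecker trick used in the proof can be verified on a single tree level with plain integer arithmetic. In the sketch below the per-level node lists, the zero-padding of the descendant vectors to a common length, and the helper names `dot`, `kron` and `gamma` are assumptions for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def kron(u, v):
    """Kronecker product of two vectors, returned as a flat list."""
    return [a * b for a in u for b in v]


def gamma(level_nodes):
    """Sum of x(v) (x) delta(v) over the nodes of one level (no attributes)."""
    total = None
    for x, delta in level_nodes:
        term = kron(x, delta)
        total = term if total is None else [s + t for s, t in zip(total, term)]
    return total


# Hypothetical (x(v), delta(v)) pairs for the nodes at one level of each tree;
# the descendant vectors are zero-padded to the common length h = 3.
level1 = [((1, 2, 0), (1, 2, 1)), ((0, 1, 1), (1, 0, 0))]
level2 = [((2, 0, 1), (1, 1, 0)), ((1, 1, 1), (1, 3, 2)), ((0, 0, 3), (1, 0, 0))]

# Double sum as in Eq. (8): one product of inner products per node pair.
double_sum = sum(dot(d1, d2) * dot(x1, x2) for x1, d1 in level1 for x2, d2 in level2)

# Factored form as in Eq. (9): one vector per tree level, then one inner product.
factored = dot(gamma(level1), gamma(level2))

print(double_sum, factored)  # 39 39
```

The factored form touches each node once to build $\gamma$, so the per-level cost is linear in the number of nodes rather than quadratic, at the price of working with vectors of length $nh$.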

2.3 Pointcloud kernels 

Anatomical measurements can also be weighted by location using 3-D position
alone in a pointcloud kernel. The pointcloud kernel does not use the tree structure
but treats each edge in the tree as a point and compares all points:

$$K_{PC}(T_1, T_2) = \sum_{e_1 \in E_1} \sum_{e_2 \in E_2} k_e(e_1, e_2), \qquad (10)$$

where $k_e$ is a kernel on attributed edges. We use a Gaussian edge kernel (GPC):

$$k_e(e_1, e_2) = \underbrace{e^{-\lambda_1 \|x(e_1) - x(e_2)\|^2}}_{c_2} \cdot \underbrace{e^{-\lambda_2 \|a(e_1) - a(e_2)\|^2}}_{c_1}. \qquad (11)$$

The kernel is designed to weight the contribution to the total kernel $K_{PC}$ of
the airway wall area percentage kernel value $c_1$ between edges $e_1$ and $e_2$ by the
geometric alignment of the same edges, defined by the geometric kernel value $c_2$.

2 e.g., $[a, b, c] \oplus [d, e] = [a + d, b + e, c]$.
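A minimal Python sketch of the Gaussian pointcloud kernel follows. It assumes that each edge is summarized by one position vector and one attribute vector, and the toy values and the function names are made up for the example; how an edge position is chosen in practice is not specified here.

```python
import math


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def gaussian_edge_kernel(e1, e2, lam_x, lam_a):
    """Eq. (11): geometric term c2 times attribute term c1."""
    (x1, a1), (x2, a2) = e1, e2
    c2 = math.exp(-lam_x * sq_dist(x1, x2))  # geometric alignment of the edges
    c1 = math.exp(-lam_a * sq_dist(a1, a2))  # similarity of the measurements
    return c2 * c1


def pointcloud_kernel(edges1, edges2, lam_x, lam_a):
    """Eq. (10): sum the edge kernel over all pairs of edges."""
    return sum(
        gaussian_edge_kernel(e1, e2, lam_x, lam_a)
        for e1 in edges1
        for e2 in edges2
    )


# Hypothetical edges: (3-D position, 1-D wall area percentage attribute).
tree_a = [((0.0, 0.0, 0.0), (0.40,)), ((1.0, 0.0, 0.0), (0.45,))]
tree_b = [((0.1, 0.0, 0.0), (0.42,)), ((10.0, 0.0, 0.0), (0.45,))]

# Inverse-dimension heuristic from the node kernels: 1/3 for positions, 1 for a scalar.
k_ab = pointcloud_kernel(tree_a, tree_b, 1.0 / 3.0, 1.0)
print(k_ab)
```

In this toy case the edge of `tree_b` located at distance 10 contributes almost nothing, even though its attribute matches an edge of `tree_a` exactly: the geometric factor suppresses comparisons between measurements taken at distant locations.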

2.4 Baseline kernels 

The kernels presented in this paper are compared to a set of baseline kernels. 
Standard airway wall area percentage measurements are often compared by using 
an average measure over parts of the tree or a vector of average measures in 
chosen generations. We use two baseline airway wall area percentage kernels: 

$$K_{AAW\%}(T_1, T_2) = e^{-\|a_1 - a_2\|^2}, \qquad (12)$$

$$K_{AgAW\%}(T_1, T_2) = e^{-\|(a_1)_{(3-6)} - (a_2)_{(3-6)}\|^2}, \qquad (13)$$

where $a_i$ is the average airway wall area percentage averaged over all centerline
points in the tree, and $(a_i)_{(3-6)}$ is a 4-dimensional vector of average airway wall
area percentages averaged over all centerline points in generations 3-6 in tree
$T_i$. For these kernels (AAW%, AgAW%), linear versions were also computed (i.e.
$e^{-\|w_1 - w_2\|^2}$ replaced with $\langle w_1, w_2 \rangle$), but the corresponding classification results
are not reported as they were consistently weaker than the Gaussian kernels.

Airway segmentation is likely more difficult in diseased as opposed to healthy
subjects, as also observed by [20]. In order to check whether the number of
detected branches may be a bias in the studied kernels, we compare our kernels
to a linear and a Gaussian branchcount kernel (LBC/GBC) defined by

$$K_{LBC}(T_1, T_2) = \#(V_1) \cdot \#(V_2), \qquad K_{GBC}(T_1, T_2) = e^{-\|\#(V_1) - \#(V_2)\|^2}. \qquad (14)$$

The linear kernel LBC is the most natural, since the Hilbert space associated to
a linear kernel on $w \in \mathbb{R}^n$ is just $\mathbb{R}^n$. However, a linear kernel on 1-dimensional
input cannot be normalized, as (15) produces a kernel matrix with all entries equal to 1,
and the GBC kernel is used for comparison in Table 2 to show that the geometric
tree kernels are, indeed, measuring something other than branch count.





                               Embedded paths          Node-path
                               (m landmark points)

All-paths                      $O(|V|^4 mn)$           $O(|V|^2 h \max_l |V^l|^2 (n + h))$
Root-paths                     $O(|V|^2 mn)$           $O(h \max_l |V^l|^2 (n + h))$
Attributed all-paths           N/A                     $O(h |V|^2 \max_l |V^l|^2 (n + d + h))$
Attributed root-paths          N/A                     $O(h \max_l |V^l|^2 (n + d + h))$
Attributed linear root-paths   N/A                     $O(|V|hnd)$
Pointcloud kernel              N/A                     $O(|V|^2 nd)$

Table 1. Computational complexities for the considered kernels. Trees are assumed to
be embedded in $\mathbb{R}^n$ and admit additional vector valued measurements in $\mathbb{R}^d$.



Kernel                          Comp. time

Linear root-node-path           46 m 43 s
Gaussian branchcount            23 m 3 s
Average AW %                    0.87 s
Average generation AW %         1.61 s
Shortest path                   42 m 26 s
Weisfeiler Lehman (h = 10)      59 m 23 s

Table 2. Runtime for selected kernels on a larger set of 9710 airway trees.



Several state-of-the-art graph kernels were also used. The random walk kernel
[21] did not finish computing within reasonable time. The shortest path
kernel [5] was computed with edge number as path length, and the Weisfeiler-
Lehman kernel [19] was computed with node degree as node label. Results are
reported in Tables 2, 3 and 4.



3 Experiments 

Analysis was performed on airway trees segmented from CT-scans of 1966 sub- 
jects from a national lung cancer screening trial. Triangulated mesh representa- 
tions of the interior and exterior wall surface were found using an optimal surface 
based approach [18] , and centerlines were extracted from the interior surface us- 
ing front propagation [14] . As the resulting centerlines are disconnected at bifur- 
cation points, the end points were connected using a shortest path search within 
an inverted distance map of the interior surface. The airway centerline trees 
were normalized using person height as an isotropic scaling parameter. Airway 
wall thickness and airway radius were estimated from the shortest distance from 
each surface mesh vertex to the centerline. The measurements were grouped and 
averaged along the centerline by each nearest landmark point. 

Out of the 1966 participants, 980 were diagnosed with COPD level 1-3 based 
on spirometry, and 986 were symptom free. The minimal/maximal/average num- 
ber of branches in an airway tree was 29/651/221.5, respectively. 



3.1 Kernel computation and computational time 

The kernels listed in Table 4 were implemented in Matlab and computed on a
2.40GHz Intel Core i7-2760QM CPU with 32 GB RAM. Each kernel matrix was
normalized to account for difference in tree size:

$$K_{norm}(T_1, T_2) = \frac{K(T_1, T_2)}{\sqrt{K(T_1, T_1) K(T_2, T_2)}}. \qquad (15)$$

An exception was made for linear kernels between scalars (LBC and AAW%),
since normalizing such kernels gives a kernel matrix whose coefficients all equal 1.
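The normalization in Eq. (15), and the reason it degenerates for linear kernels on scalars, can be seen in a short Python sketch. The kernel matrices are stored as nested lists, and the branch counts and 2-D feature vectors are example values chosen for illustration.

```python
import math


def normalize(K):
    """Eq. (15): divide each entry by the geometric mean of the two diagonal entries."""
    n = len(K)
    return [
        [K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Linear kernel on scalar branch counts: K[i][j] = c_i * c_j.
counts = [29, 221, 651]
K_lbc = [[ci * cj for cj in counts] for ci in counts]
print(normalize(K_lbc))  # every entry is 1.0, so the counts are lost

# Linear kernel on 2-D vectors keeps the angular information after normalization.
vectors = [(1.0, 0.0), (1.0, 1.0)]
K_lin = [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]
print(normalize(K_lin)[0][1])  # 1 / sqrt(2), about 0.7071
```

For positive scalars, $c_i c_j / \sqrt{c_i^2 c_j^2} = 1$ regardless of the values, which is why the scalar linear kernels were left unnormalized.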

Computation times for the different kernels used in the classification experiments
in Section 3.3 on 1966 airway trees are shown in Table 4. To demonstrate
scalability, some of the kernels were run on 9710 airways from a longitudinal
study of the 1966 participants, see Table 2. The slower kernels were not included.

3 Software: http://image.diku.dk/aasa/software.php; published software was used
for SP, WL [19],

For classification and hypothesis testing, a set of 1966 airway trees from 1966 
distinct subjects was used (980 diagnosed with COPD at scan time). 

3.2 Hypothesis testing: Two-sample test for means 

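Let $X$ denote a set of data objects. Given any positive semidefinite kernel $k: X \times
X \to \mathbb{R}$ there exists an implicitly defined feature map $\phi: X \to \mathcal{H}$ into a reproducing
kernel Hilbert space $(\mathcal{H}, \langle \cdot, \cdot \rangle)$ such that $k(x_1, x_2) = \langle \phi(x_1), \phi(x_2) \rangle$ for all $x_1, x_2 \in
X$ [5]. Hypothesis tests can be defined in $\mathcal{H}$ to check whether two samples $A, B \subset
X$ are implicitly embedded by $\phi$ into distributions on $\mathcal{H}$ that have, e.g., the same
means $\mu_A = \mu_B$ E3. Denote by $\hat{\mu}_A$ and $\hat{\mu}_B$ the sample means of $\phi(A)$ and $\phi(B)$
in $\mathcal{H}$, respectively; we use as a test statistic the distance

$$T(A, B) = \|\hat{\mu}_A - \hat{\mu}_B\|_{\mathcal{H}}$$

between the sample means and check the null hypothesis using a permutation
test. Writing $|A| = a$ and $|B| = b$, we divide $X = A \cup B$ into $N$ random partitions
$A_i, B_i$ of size $|A_i| = a$ and $|B_i| = b$, $i = 1 \ldots N$, compute the test statistic $T_i$
for each partition, and compare it with the statistic $T_0$ obtained for the original
partition $X = A \cup B$. An approximate p-value giving the probability of $\phi(A)$ and
$\phi(B)$ coming from distributions with identical means $\mu_A = \mu_B$ is now given by
$p = \frac{|\{T_i \mid T_i \ge T_0, \, i = 1 \ldots N\}| + 1}{N + 1}$. The statistic can be computed from a kernel matrix
since distances in $\mathcal{H}$ can be derived directly from the values of $k(X, X)$ using
the binomial formula:

$$\begin{aligned} \|\hat{\mu}_A - \hat{\mu}_B\|^2 &= \Big\langle \tfrac{1}{a} \sum_{i=1}^{a} \phi(a_i) - \tfrac{1}{b} \sum_{j=1}^{b} \phi(b_j), \; \tfrac{1}{a} \sum_{i=1}^{a} \phi(a_i) - \tfrac{1}{b} \sum_{j=1}^{b} \phi(b_j) \Big\rangle \\ &= \tfrac{1}{a^2} \sum_{i=1}^{a} \sum_{m=1}^{a} \langle \phi(a_i), \phi(a_m) \rangle - \tfrac{2}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} \langle \phi(a_i), \phi(b_j) \rangle + \tfrac{1}{b^2} \sum_{j=1}^{b} \sum_{n=1}^{b} \langle \phi(b_j), \phi(b_n) \rangle \\ &= \tfrac{1}{a^2} \sum_{i=1}^{a} \sum_{m=1}^{a} k(a_i, a_m) - \tfrac{2}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} k(a_i, b_j) + \tfrac{1}{b^2} \sum_{j=1}^{b} \sum_{n=1}^{b} k(b_j, b_n). \end{aligned}$$

Using the test with selected kernels we show that healthy airways and COPD
airways do not come from the same distributions (Table 3).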
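The permutation test can be sketched directly on a precomputed kernel matrix. The following Python code is an illustration under stated assumptions: it uses the squared distance between sample means (which orders partitions in the same way as the distance itself), a toy linear kernel on scalars, and made-up function names; it is not the published Matlab implementation.

```python
import random


def mean_distance_sq(K, idx_a, idx_b):
    """Squared RKHS distance between sample means, via the binomial formula."""
    a, b = len(idx_a), len(idx_b)
    kaa = sum(K[i][m] for i in idx_a for m in idx_a) / (a * a)
    kab = sum(K[i][j] for i in idx_a for j in idx_b) / (a * b)
    kbb = sum(K[j][n] for j in idx_b for n in idx_b) / (b * b)
    return kaa - 2.0 * kab + kbb


def permutation_p_value(K, idx_a, idx_b, n_perm, seed=0):
    """p = (#{T_i >= T_0} + 1) / (N + 1) over N random re-partitions."""
    rng = random.Random(seed)
    t0 = mean_distance_sq(K, idx_a, idx_b)
    pooled = list(idx_a) + list(idx_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        t = mean_distance_sq(K, pooled[:len(idx_a)], pooled[len(idx_a):])
        hits += t >= t0
    return (hits + 1) / (n_perm + 1)


# Hypothetical scalar data: two clearly separated groups of eight samples.
values = [0.1 * i for i in range(8)] + [5.0 + 0.1 * i for i in range(8)]
K = [[u * v for v in values] for u in values]  # linear kernel
idx_a, idx_b = list(range(8)), list(range(8, 16))

print(mean_distance_sq(K, idx_a, idx_b))          # (0.35 - 5.35)^2, i.e. about 25
print(permutation_p_value(K, idx_a, idx_b, 199))  # small, at least 1 / 200
```

With $N$ permutations the smallest attainable value is $1/(N + 1)$, so $N = 10{,}000$ gives a floor of about $9.99 \cdot 10^{-5}$.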



Kernel                            p-value

Gaussian pointcloud               $9.99 \cdot 10^{-5}$
Gaussian branchcount              $9.99 \cdot 10^{-5}$
Average AW-wall %                 $9.99 \cdot 10^{-5}$
Generation-average AW-wall %      $9.99 \cdot 10^{-5}$
Linear all-node-path              $9.99 \cdot 10^{-5}$
Linear root-node-path             $9.99 \cdot 10^{-5}$
Shortest path                     $9.99 \cdot 10^{-5}$
Weisfeiler Lehman                 $9.99 \cdot 10^{-5}$

Table 3. Permutation tests for the means of the COPD patient and healthy subject
samples. All permutation tests are made with 10,000 permutations.



Kernel type                                   Mean class.     Kernel matrix        Mean class. accuracy,
                                              accuracy        computation time     $K + K_{GBC}$

Rootpath, linear (3), (4)                     62.4 ± 0.7%     9 h 9 m 20 s         66.8 ± 0.4%
Rootpath, Gaussian (3), (4)                   64.9 ± 0.4%     6 h 53 m 21 s        68.2 ± 0.5%
All-node-paths, linear (2), (5)               62.0 ± 0.6%     3 h 7 s              63.2 ± 0.5%
Root-node-path, linear (3), (5)               61.8 ± 0.7%     4 m 24 s             62.9 ± 0.8%
Root-node-path, Gaussian (3), (5)             64.4 ± 0.8%     97 h 21 m 45 s       64.9 ± 0.6%
Root-node-path, linear (3), (5),              58.6 ± 0.6%     19 m 44 s            62.3 ± 0.8%
  airway wall area % attribute
Pointcloud, Gaussian (10)                     64.4 ± 0.6%     18 h 40 m 26 s       66.5 ± 0.6%
Branchcount, linear (14)                      62.3 ± 1.0%     0.08 s               N/A
Branchcount, Gaussian                         63.3 ± 0.4%     0.2 s                N/A
Linear kernel on average                      56.2 ± 0.6%     0.62 s               63.3 ± 0.5%
  airway wall area % (12)
Gaussian kernel on average                    60.3 ± 0.2%     0.35 s               63.3 ± 0.5%
  airway wall area %, generations 3-6 (13)
Shortest path [3]                             62.6 ± 0.4%     20 m 24 s            63.4 ± 0.4%
Weisfeiler Lehman (h = 10) [19]               62.1 ± 0.5%     14 m 40 s            62.9 ± 0.5%

Table 4. Classification results for COPD on 1966 individuals, of which 893 have COPD.



3.3 COPD classification experiments 

Based on the kernel matrices corresponding to the kernels described in Sec. 3.1
for a set of 1966 airway trees, classification into COPD/healthy was done using a
support vector machine (SVM) [4]. The SVM slack parameter was trained using
cross validation on 90% of the entire dataset, and tested on the remaining 10%.
This experiment was repeated 10 times and the mean accuracies along with their
standard deviations are reported in Table 4. All kernel matrices were combined
with the GBC kernel matrix in order to check whether the kernels were, in fact, 
detecting something other than branch number. 

4 Discussion 

We have constructed a family of kernels that operate on geometric trees, and 
seen that they give a fast way to compare large sets of trees. We have applied the 
kernels to hypothesis testing and classification of COPD based on airway tree 
structure and geometry, along with state-of-the-art methods. We show that there 
is a connection between COPD and airway wall area percentage, and that the connection
detected based on our weighted airway wall area percentage kernels is stronger
than what can be found using average airway wall area percentage measurements
over different airway tree generations, which is commonly done [10,11].

Efficient kernels for trees with vector-valued node attributes are difficult to 
design because algorithmically, similarity of vector-valued attributes is more 
challenging to efficiently quantify than equality of discrete-valued attributes.
Nevertheless, some of the defined kernels for vector-attributed trees are fast
enough to be applied to large datasets from clinical trials. 

Vector-valued attributes are important from a modeling point of view, as they
allow inclusion of geometric information such as branch shape or clinical mea- 
surements in the trees. However, there is a tradeoff between computational speed 
and optimal use of the attributes. The efficient node paths are less robust than 
the embedded paths in airway segmentations with missing or spurious branches, 
and we observe a small drop in classification performance in Table 4. Rootpath
kernels are introduced to improve computational speed. However, they do intro- 
duce a bias towards increased weighting of parts of the tree close to the root, 
which are contained in more root-paths. Gaussian local kernels perform significantly
better than linear ones (Table 4), which is particularly pronounced in the
pointcloud kernel. In convolution kernels based on quantification of substructure 
similarity rather than isomorphic substructure, all the dissimilar substructures 
are still contributing to the total value of the kernel, and the Gaussian local ker- 
nel downscales the effect of dissimilar substructures much more efficiently than 
the linear kernel. This is particularly pronounced in kernels that use geometric 
weighting of airway wall measurement comparison. Unfortunately, however, algorithmic
constructions like the Kronecker trick (Prop. 7) do not work for the
Gaussian kernels, which do not scale well to larger datasets. 

Using hypothesis tests for kernels we show that the healthy and COPD diag- 
nosed airway trees come from different distributions. Using SVM classification we 
show that COPD can be detected by kernels that depend on tree geometry, tree 
geometry attributed with airway wall area percentage measurements, or combi- 
natorial airway tree structure. Another efficient detector of COPD is the number 
of branches detected in the airway segmentation. It is thus important to clarify 
that our defined kernels are not just sophisticated ways of counting the detected 
branches. Combining the GBC kernel with the other kernels improves classifi- 
cation performance of the geometrically informed tree and pointcloud kernels, 
showing that these kernels must necessarily contain independent information, 
and the connection between COPD and airway shape is more than differences 
in detected airway branch numbers. In contrast, graph kernels that only use the 
tree structure are not significantly improved by combination with the branch 
count kernel. Future work includes efficient ways of computing all-paths kernels 
with linear node attributes, efficient kernels for trees with errors in them, as well
as replacing the Gaussian local kernels with more efficient RBF type kernels.